Signal Boosting for Translingual Topic Tracking: Document Expansion and n-best Translation

Author

  • Douglas W. Oard
Abstract

The University of Maryland participated in the TDT topic tracking task. This chapter describes the system architecture, including source-dependent normalization, and then focuses on the cross-language case, in which English training stories were used to find Mandarin stories on the same topic. Processes that may introduce noise, including errorful translation and transcription, are described, and five techniques for minimizing the impact of a reduced signal-to-noise ratio are identified. Three techniques focus on signal boosting: augmenting story representations with topically related terminology through document expansion, exploiting knowledge of alternative translations using balanced n-best term translation, and enriching the bilingual term list to improve translation coverage. The remaining two techniques focus on noise reduction: removing common stopwords before translation and using corpus statistics to guide translation selection. Two of the signal boosting strategies yielded substantial gains using techniques that can be ported to other languages fairly easily, while outperforming state-of-the-art general-purpose machine translation. By contrast, neither of the noise reduction strategies produced significant improvements. The chapter concludes with a brief discussion of future research directions suggested by these results.

Similar resources

Translingual Topic Tracking: Applying Lessons from the MEI Project

The University of Maryland participated in the topic tracking task, submitting four runs for the required conditions (basic and challenge). In this working notes paper, we present preliminary results based on those runs and six additional contrastive runs that explored translation selection, post-translation resegmentation, post-transcription document expansion, and source-dependent normalization. One...

Translingual Information Retrieval: Learning from Bilingual Corpora (AI Journal Special Issue: Best of IJCAI-97)

Translingual information retrieval (TLIR) consists of providing a query in one language and searching document collections in one or more different languages. This paper introduces new TLIR methods and reports on comparative TLIR experiments with these new methods and with previously reported ones in a realistic setting. Methods fall into two categories: query translation and statistical-IR appr...

Multi-scale-audio indexing for translingual spoken document retrieval

MEI (Mandarin-English Information) is an English-Chinese crosslingual spoken document retrieval (CL-SDR) system developed during the Johns Hopkins University Summer Workshop 2000. We integrate speech recognition, machine translation, and information retrieval technologies to perform CL-SDR. MEI advocates a multi-scale paradigm, where both Chinese words and subwords (characters and syllables) ar...

Translingual Document Representations from Discriminative Projections

Representing documents by vectors that are independent of language enhances machine translation and multilingual text categorization. We use discriminative training to create a projection of documents from multiple languages into a single translingual vector space. We explore two variants to create these projections: Oriented Principal Component Analysis (OPCA) and Coupled Probabilistic Latent ...

Translingual Information Retrieval: Learning from Bilingual Corpora

Translingual information retrieval (TLIR) consists of providing a query in one language and searching document collections in one or more different languages. This paper introduces new TLIR methods and reports on comparative TLIR experiments with these new methods and with previously reported ones in a realistic setting. Methods fall into two categories: query translation and statistical-IR appr...

Publication date: 2002